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We propose a generalization of the Quantum Monte Carlo loop algorithm to the t-J model by 
a mapping to three coupled six-vertex models. The autocorrelation times are reduced by orders of 
magnitude compared to the conventional local algorithms. The method is completely ergodic and 
can be formulated directly in continuous time. We introduce improved estimators for simulations 
with a local sign problem. Some first results of finite temperature simulations are presented for a 
t-J chain, a frustrated Heisenberg chain, and t-J ladder models. 



I. INTRODUCTION 



Quantum Monte Carlo (QMC) methods are a powerful tool for the investigation of strongly interacting systems. 
They are easy to generalize and can therefore be applied to almost any model. In addition, they can be used for 
large systems and give unbiased results that are exact within given statistical errors. They are thus an ideal tool 
for numerical simulations of complex systems. A major problem however is that the results are not useful if the 
statistical errors become too large. This happens in many interesting cases. Classical local update Monte Carlo (MC) 
simulations near second order phase transitions suffer from "critical slowing down" : the autocorrelation time and with 
it the statistical errors diverge at the critical point. This problem has been solved for many classical spin systems by 
cluster algorithms, which construct global updates of large clusters instead of performing local spin flips. 

Recently., a. generalization of these cluster methods to quantum spin systems, the loop algorithm, has been 
developed.ETu For a review see Rcf . H This method can solve the problem of critical slowing down also for QMC simu- 
lations. It has made possible to investigate phase transitions in quantum spin systemspliij far beyond the possibilities 
of previous MC techniques. 

The loop algorithm can be generalized to particle models. The original loop methodBB can be applied directly 
to hard jcore bosons and to spinless fermions. A Hubbard model can be simulated by coupling two spinless fermion 
systemsJj One problem in QMC simulations of the Hubbard model is that its dominant energy scale is the Coulomb 
repulsion U ^> t, while the interesting low lying excitations are at a much smaller energy scale J = At 2 /U <C U. To 
investigate the low energy properties it is thus of advantage to simulate the effective low energy Hamiltonian, the 
t-J model. Erevious finite temperature simulations for the t-J model have been carried out both in a determinantal 
formulationEj in two dimensions (2D), which suffered from serious-sign problems and metastability, and in the worldlinc 
formulation in one dimension (ID), with standard MC updates.Eil As we will show explicitly later, such standard MC 
simulations suffer from strong autocorrelations which seriously limit the accessible system sizes and temperatures. 
They are also non-ergodic, and like the determinantal simulations have to be extrapolated to continuous imaginary 
time. 

In the present paper we present a loop algorithm for the t-J model (for any dimension), which overcomes these 
autocorrelation problems and has additional advantages such as complete ergodicity, the existence of improved esti- 
mators, which further reduce the error of measured quantities by implicitly averaging over many configurations, and 
the possibility of directly taking the continuous time limit. Some of our results have already been presented in Refs. 
p2j , p3j An apparently related approach has very recently been used for 2D simulations in Ref. ^J. 
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Quantum Monte Carlo simulations of fermionic models or of frustrated spin systems nearly always suffer from 
the "negative sign problem" . In order to perform QMC simulations we first have to map the quantum system to a 
classical one. This mapping can introduce negative weights, which cause cancellation effects. The statistical error for 
a given amount of computational effort can then increase exponentially with system size and inverse temperature. 
This severely restricts simulations of higher dimensional fermionic models. By combining loop updates with improved 
estimators we can reduce the variance of the observables and thus lessen the sign problem. 

This paper is organized as follows. First we review the worldline QMC algorithm and the standard loop algorithm 
for a Heisenberg chain. In Sec. Ill we describe the loop algorithm for the t-J model. The use of improved estimators 
is discussed in Sec. IV. Finally in Sec. [V] we discuss the performance of the new algorithm and show some first 
results obtained for a t-J chain, a frustrated Heisenberg chain, and t-J ladder models. In the appendix we discuss 
the continuous time version of our method. 



II. BACKGROUND MATERIALS 

To establish notation and formal background we briefly describe the worldline representation and tlie.standard loop 
algorithm.|-||iyft,refer to the literature for more detailed descriptions of the worldline representationcJ and the loop 
algorithm.au'tM As an example, we take the ID Heisenberg antiferromagnet. The Hamiltonian is defined by 

L L 

H H = Y J H {€ > =Y,JSi-S i+1 , (1) 

i=l i=l 

where Si denotes a spin- 1/2 operator on site i, J > for the antiferromagnet, and the periodic boundary condition 
Sl+i — Si is adopted. 

A. Worldline representation 

We use the Trotter-Suzuki decompositiorS and a path-integral formulation in imaginary time. The Hamiltonian 
H is decomposed into two terms H — H cvcn + i? ddj each of which is easy to diagonalize. Then 



Z = Tre _/3H = lim Tr 

M—>oo 



E 



^ e -Ar(H 0V cn + ff odd )^ = Tf ^ ( p - Ar^eve^- Arff odd ) ^ + q ( Ar 2) 

i l \e~^ H ^\i 2M ){i iM \e-^ H o^\i iM _ 1 ) ■ ■ ■ <* 3 |e- A ^™%)< i2 |e- A ^odd| n ) + (At 2 ) , (2) 



where At — (3/M, and M is called Trotter-number. The summation with respect to \ik) is taken over complete 
orthonormal sets of states. 

We may consider Eq. (g) as the evolution of the initial state in imaginary time with one application of the time 
evolution operator within a time step At. The partition function Z in Eq. (||) is also formally the partition function 
of a (d+1) dimensional classical system. The systematic error of order At 2 due to the finite time step approximation 
can be extrapolated to At 2 — ► by fitting to a polynomial in At 2 . The loop algorithm can also be formulated directly 
in the continuous time limit At — ► 0.E3 This will be discussed in appendix |A| 

The decomposition of the Hamiltonian has to be chosen according to the problem. For our ID system with only 
nearest neighbor interaction we take -ff C vcn/odd = Si even/odd-^ > leading to a checkerboard structure as shown in 
Fig. [I]. The Hamiltonian acts only on the shaded plaquettes p in Fig. |l|, each of which contributes a factor w p to the 
matrix elements in Eq. (^|), i.e., 

z = E LK^E^ C )- ( 3 ) 

i\,---,iiu P {C} 

For the Heisenberg model we expand the states \ik) in an S z eigenbase. Each H"' = Si ■ Si+i conserves magnetiza- 
tion. Therefore there are only six nonvanishing matrix elements for each shaded plaquette, which can be represented 
by solid and dashed lines connecting up and down spins, respectively, as shown in Fig. |^. Thus the sum in Eq. is 
taken over configurations C = {\ik)} of continuous worldlines. One example is shown in Fig. ^ Note that C can be 
identified with a set of binary variables Sf = ±1/2, each defined on a site in the checkerboard, with the restriction 
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space direction 



FIG. 1. Example of a world line configuration of the Heisenberg model with the checkerboard decomposition. The imaginary 
time r runs along the vertical axis, and the real space direction along the horizontal axis. The solid lines represent up-spins, 
the dashed lines down spins. The shaded plaquettes show the application of e~ ATH< " ' . 
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FIG. 2. The six allowed plaquette states of the Heisenberg model that fulfill the magnetization-conservation condition. 
The second row shows the weights of the plaquettes for the Hamiltonian (Eq. [l]). The solid lines connect two sites occupied 
by up spins, and the dashed ones connect down spins. (We have assumed a bipartite lattice and rotated the spin-operators 
gx,y _gx, y ^ ma k e t ne weight of the last two plaquettes positive.) 



due to the magnetization conservation. It is convenient to define C p as the local state of a given shaded plaquette p, 
namely, a set of four binary variables at its four corners. The configuration C is then identified with the union of the 
plaquette states. Accordingly, w p in Eq. (|^) can be written as w(C p ). 
Thermal averages of observables O can be written in a similar way as 

(0) = ^W(C)0(C), ( 4 ) 

{C} 

where 0(C) is the value of the observable in the configuration C. 

If the weight of a configuration W(C) can take negative values, one has to use its absolute value |W(C)| to construct 
the probabilities for the Markov chain of a MC procedure (see below), since these probabilities need to be positive. 
Expectation values are then given by 

E {C }W(C)Q(C) J2 {c} \W(C)\ S ign(C)0(C) 
{ } Z £ {C }l^(C)|sign(C) 

E {c} |w(c)|sign(C)0(C) 
= ^ {C} ™ = (sign-Q)| W | 

I] {c} l^(e)|sign(c) (sign) |W | ' 

where sign(C) stands for the sign of W(C), and (• • -}\w\ denotes expectation values with respect to the absolute value 
of the weight W. 

In many cases, a "sign problem" now stems from the fact that the average sign, (sign.)i^i, may decay exponentially 
with increasing system size and inverse temperature (3. For fixed computational effort this then leads to an exponential 
blow-up of the errors. 
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B. Local worldline algorithms 



The thermal averages Eqns. (|, g) can be taken by MC importance sampling. One constructs a sequence (Markov 
chain) of configurations C'*' such that in the limit of infinitely many configurations their distribution agrees with the 
correct Boltzmann distribution p(CW) = W(C^)/Z. 

This can be achieved by satisfying two conditions: ergodicity of the Markov chain, and detailed balance 

p(C C) W{C) ' 1 ' 

where p(C — > C) is the probability of choosing the configuration C as the next configuration in the Markov chain, 
when the current configuration is C. 

Then the thermal expectation value Eq. (Q) of an observable O can be estimated by averaging the value of the 
observable in the configurations CW; 

_ _ 1 N 

(0)= lim O, 0=^0(C«). (7) 

i 

In cases with a sign problem the averages in Eq. (|5|) are done separately for the numerator and for the denominator: 

1 t — < n . s . \ ( 7. 1 V ' 



iEr=jv( si g n )wi 



In standard local algorithms an update from one configuration C to the next one is done by proposing a new 
configuration C that differs from C by a small local change of the world-lines. The candidate C is accepted with a 
probability that satisfies detailed balance, e.g. the Metropolis probabilitytH 



or the heatbath probability 



* c - c '>= w(c)fw ( c>) ■ m 



otherwise the configuration C is kept. 

There are two major problems with local updates: Firstly, consecutive configurations are strongly correlated. It 
takes on average a number r of updates to arrive at a statistically independent configuration. This autocorrelation 
time r, which depends on the measured quantity O, typically increases quadratically with spatial correlation length 
£ and inverse energy gap A -1 (resp. system size L and inverse temperature j3 when £ > L or A -1 > /?). To achieve a 
desired statistical accuracy, the MC simulation has to be lengthened by a factor t, which can easily reach orders of 10 6 
and larger in practical cases. Secondly, local updates are not ergodic. For example, when applied to the Heisenberg 
model, they cannot change total magnetization nor spatial winding number. Many quantities of physical interest, like 
the superfluid density, are then very difficult to estimate. In addition, it was pointed outtl that a complicated quantity 
exists that does not vary in conventional local updates for the XYZ model. To make the conventional algorithm 
ergodic, therefore, we usually have to include some ad hoc global updates, which tends to make the resulting code 
rather cumbersome. Also, the acceptance rate of such ad hoc global updates is often very small, which is another 
cause of long autocorrelation times. 



C. Loop algorithm for the Heisenberg model 

Both kinds of difficulties are overcome in the loop algorithm which achieves large nonlocal configuration changes in 
one stochastic update. Autocorrelation times for the loop algorithm are found to be orders of magnitude smaller than 
those for the conventional algorithm. In addition, Jl does not suffer from the above-mentioned ergodicity problems, 
and can be formulated directly in continuous timellj 

In the loop algorithm, each update consists of two steps, both stochastic. In the first step, the current worldline 
configuration C is mapped with a probability p(C — > G) to a graph-configuration G. G — {G p } consists of local 
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graph segments G p denned on the plaquettes p, which combine to form a set of closed loops. In the second step, the 
configuration of loops is mapped with a probability p{G — > C') to a new worldline configuration C' . 

Let us explain the simple case of the Heisenberg model. Two observations are important: (1) Since the worldlines 
are continuous, the difference between two arbitrary worldline configurations (in the sense of an exclusive-or) , i.e. the 
location of spin-flips in any allowed update of a configuration, is located on a set of closed loops. These are the loops 
we will construct. Flipping all the spins on a closed loop will be called a loop flip. In Fig. |^ we show an example for a 
loop-update step for the Heisenberg model. (2) Since the Hamiltonian acts locally, the partition function Z in Eq. (||) 
is a product of plaquette terms. We can therefore fulfill detailed balance separately for each plaquette, provided the 
global constraint of closed loops is satisfied. 




(a) (b) (c) 

FIG. 3. Example of a loop-update step for the Heisenberg model (anisotropic case, which has finite probability for diagonal 
graph segments). On the left (a) we show an initial configuration C of up-spins (dashed lines) and down-spins (solid lines) 
which is mapped to a configuration G of loops in the middle figure (b). Some of these loops are selected (with probability 1/2) 
to be flipped, i.e. the spins along these loops change direction. We denote the loops that will be flipped by dashed lines, the 
unchanged loops by solid lines. The figure on the right (c) shows the spin configuration after the loop-flips. 

By inspecting the six allowed local states C p on a plaquette (Fig. ||), we see that for each plaquette, spin-flips must 
occur on pairs of sites, not on single spins, in order to arrive at another allowed local state. We connect the pairs of 
sites on which spins are to be flipped together by solid lines: these are loop segments. Since there are several possible 
pairings of sites, the lines can in principle run horizontally, vertically, or diagonally. Also, all four sites can be flipped 
simultaneously without violating the restriction. The symbol G p stands for the two loop segments on plaquette p. 
We will speak of G p as the graph on plaquette p. The union G = U P G P constitutes a complete graph configuration 
G. We call the graph G p in which all 4 spins are grouped together the "freezing" graph, since (without symmetry 
breaking field) the flip of all 4 spins will leave the plaquette weight w p invariant. For a given plaquette configuration 
C p only certain graphs G p are possible, namely those for which an update along connected points leads to another 
allowed plaquette configuration C' p (i.e. w(C p ) ^ 0). We define a function A(G pi C p ) so that it takes the value 1 when 
G p is allowed for a given C p and the value otherwise. It is shown in Fig. |4|. Each spin belongs to two interacting 
plaquettes. It belongs to one loop segment on each of these plaquettes, except for the "freezing" graph, in which all 
four sites are connected. Specifying G p on each interacting plaquette individually will thus automatically lead to an 
overall configuration of closed loops. Therefore, we only need to specify the probabilities p(C p — ► G p ) and p{G p — > C' p ) 
for each interacting plaquette individually. If we additionally have "freezing" graphs G p on some of the plaquettes, 
then the loops passing through these plaquettes have to be flipped together. They are "frozen" into one loop-cluster. 
This freezing is problematic if it occurs too often since then the whole lattice might just "freeze" and no change of 
weight is done by the update. Thus we want to avoid unnecessary freezing. 

We may construct loop algorithms such that loops are flipped with probability 1/2 when there is no symmetry 
breaking field, as is the case with general XY Z quantum spin systems without magnetic field. To include a symmetry 
breaking field, we factorize the local Boltzmann weight in the form 

(c P ) (ii) 

where wq{C p ) is used for defining the probability of choosing G p , whereas w a symnr(Cp) is taken into account in terms 
of the flipping probability of the loop. The weight wq(C p ) needs to be invariants under flip of all four spins at the 
plaquette p. Using this factorization, the probability p(C p — > G p ) is constructed as follows. First we choose weights 
v(Gp) for all graphs G p such that 
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FIG. 4. Plaquette configurations C p and graphs G p for the anti-ferromagnetic Heisenberg model. The upper part of the 
figure specifies the graphs G p and one solution v(G p ) of equation (|l^) for their weights. There is a free parameter e in this 
solution. If e is chosen zero, no freezing and no diagonal graph segments will occur. The third row shows the the continuous 
time limit of v(G p ) (see appendix [X]). The lower part of the figure shows the spin configurations C p and their weights, and the 
function A(G P ,C P ) which specifies whether a configuration C p and a graph G p are compatible. 



5>(G P )A(G P ,C P ) = w (C p ). (12) 

Gp 

One solution to this set of equations is shown in Fig. ^. (The solution is in general not unique; depending on H, it 
may also not exist.) Then, detailed balance for the overall update C — > C is fulfilled by 

viGp±MGM . , r , r ,, Y[ p w &m {C' p )A(Gp,C'p) 

p(L p — > Lr p ) — — — , p(U — > L ) — — j—r , (L6j 

W (Cp ) [ [ p lOasymm (Cp) + H p W asymm (C' p ) 

as can be checked easily. In general, p(G — > C) needs to satisfy detailed balance with respect to the weight 
Ftp w asymm (C p ) A (G p ,C p ). Here we have chosen a heatbath probability like Eq. (|To|) . 

The construction of the loops can be performed in a multi-cluster scheme. In this case, a graph G p is chosen on 
all plaquettes p, and we obtain a unique partitioning of the lattice-into ni loops Z,-. Then we attempt to flip all loops 
Zj according to Eq. (|i~3|). We can also use a single-cluster variant .c3 We can think of this variant as picking a single 
cluster Zj of the above partitioning with a probability p(|Zj|) according to the size of the loop |Zj| = X^sitc (Tj)eU ^' This 
loop is then flipped with respect to weights w — Yip^u w asymm(C p ) and Eq. (|9[), which means that we will always flip 
that loop if w = 1 for all plaquettes. In an implementation of this algorithm we construct this single loop by picking 
randomly any site of the lattice and building a single loop by choosing graphs G p only on the plaquettes along the 
path. Hence we need an effort only proportional to the length |Zj|. 



III. LOOP ALGORITHM FOR THE t-J MODEL 



The t-J model is defined by the Hamiltonian 

H = -t ^ X! I/ 1 ~ n 3-<?) C j,° Cl ' a ( l ~ n i-°) + hx - + J X! _ ^ n « n i) ' ( 14 ) 

<i,j> o <i,j> 

where c| CT creates a spin-1/2 fermion with z-component of spin a at site i, n^ a — cj a c% t a aud rii = ^Za n i^- ^ nc 
projection operators (1 — n,j t - a ) prohibit double occupancy of a siter. The brackets < i,j > denote nearest neighbor 
pairs. The t-J model can be represented in a worldline formulationc 1 ] in terms of variables which take three possible 
values, 0, +1 and —1, representing a hole, an up-spin, and a down-spin, respectively. The matrix elements for the 15 
different plaquettes with nonzero weight are given in Fig. |^. There are several sources of negative signs in the overall 
weight W(C) = \W(C) \ sign(C) of a configuration. For the t-J model they all stem from anticommutation of fermion 
operators. One example is the sign in the third line of Fig. ^. The overall sign can be decomposed as 
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FIG. 5. The 15 different plaquettes C p with nonvanishing weight w(C p ) for the t-J model. Up spins are denoted by a solid 
line, down spins by a broken line. The sign of w(C p ) will be taken into account according to Eq. 



-1 for periodic and b c 



(15) 

-1 for antiperiodic 



sign(C) = (-l)-w (b c ) n — 

where ri pcrm is the number of permutations of fermion worldlines, b c 
boundary conditions, and Ubound i s the number of particles hopping across the boundary. For constructing loops we 
will use the absolute value of the weight, Eq. (|TT|) . The sign will be taken into account in the MC simulation according 
to Eq. (H). It will also play a role for the improved estimators treated in section [V. 



In the last section, we have seen how a loop algorithm is constructed for a model with binary variables. In order 
to construct a loop algorithm for the t-J model we now reduce the problem with trivariate variables into three sub- 
problems with binary variables. To this end, we divide a MC step into three substeps. In substep I, variables with 
the value 0, namely holes, are left unaffected (inactive) while attempts are made to flip all the variables with values 
+1 and —1 (active variables). Similarly, in the second and the third substeps, we keep variables with the values +1 
and — 1, respectively, unaffected. Therefore, we deal with a binary problem in each substep. To each of these binary 
problems, we apply the idea of the loop algorithm. We denote as "active plaquettes" those on which all 4 variables 
are active. On the active plaquettes, the resulting algorithm for substep I is identical to the loop algorithm for the 
5 = 1/2 antifcrromagnctic Heisenberg model, while the algorithm for substeps II and III turns out to be the loop 
algorithm for the S — 1/2 XY model (which is the same as that for free fermions), as we will see below. The flipping 
probabilities of the loops are of course affected by the inactive plaquettes. 

Since we have three different binary problems, we need to construct three loop algorithms with the second and 
the third ones being transformable into one another simply by interchanging the roles of the values +1 and —1. The 
detailed balance condition holds for each of the three substeps, whereas ergodicity is achieved by the combination of 
them. We have ample freedom in choosing a set of graphs and graph weights. It is, however, advantageous for the 
computational simplicity and the reduction of autocorrelation times to choose a scheme such that the resulting loops 
may be flipped independently in a multi-cluster variant (for a different choice see Ref. ^3|). Therefore, we must have 
weights w asymm (C p ) = 1 on the active plaquettes, i.e. those where two loops may be flipped. This can be achieved by 
letting the loop updates deal only with the active plaquettes. The weights of the other plaquettes are put into the 
global weight function w asymm : 



w {C p ) 



w(Cp) if all four variables on the plaquette are active 
1, otherwise 



> ^mm(Cp) - 1 w ^, ^ , l( ] 1( , nv j so 



1, if all four variables on the plaquette are active 

■"P)\ 

In this case we can flip all loops independently with the flipping probability for a loop 

Ilpeloop U W &syn 



Pm P (k 
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asymm(Cp) + Ilpeloop l ( w asymm{C' p ) 



(16) 



(17) 



where C' p denotes the plaquette state after the flipping. 

Let us consider now in detail the probabilities in the algorithm for substep I in which variables with the value 
(holes) are inactive and kept unaffected. The algorithm is equivalent to the one for the 5=1/2 antifcrromagnetic 
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FIG. 6. Plaquette configurations C p and graphs G p for substop I (flip spin-up <-» spin-down) of the t-J model. The upper 
part of the figure specifies the graphs G p and the freezing-less solution v(G p ) of equation (|l2|). The lower part of the figure 
shows the spin configurations C p and their weights, and the function A(G P ,C P ) which specifies whether a configuration C p and a 
graph G p are compatible. The first six configurations C p and the solution v(G p ) restricted to these configurations correspond to 
the case of the antiferromagnetic Heisenberg model. The open circles in the diagrams in the top row represent active variables 
whereas solid circles stand for inactive ones. (For the plaquette configurations C p in brackets, the corresponding graph G p given 
in the figure has to be flipped spatially.) 



isotropic Heisenberg model as far as the plaquettes with only active variables are concerned. As for the plaquettes 
with inactive variables, a unique graph is assigned to each of them such that active variables, if any, are connected 
to each other (see lower part of Fig. ||). It is easy to verify that the graph weights v(G p ) shown in Fig. || satisfy the 
weight equation (|l2|). In this algorithm, the weights w asymm (C p ) remain unchanged upon flipping of a loop because 
of the spin-inversion symmetry of the Hamiltonian. Thus we obtain a loop flipping probability of 1/2. 

Next, we consider the algorithm for substep II (or equivalcntly substep III), where all down-spins are kept unchanged. 
This time, the algorithm on the active plaauettes is equivalent to the one for the S = 1/2 XY model (since that is 
the same as the algorithm for free fermionsa) rather than that for the antiferromagnetic Heisenberg model. Again, a 
unique graph is assigned to each plaquette with inactive variables such that any active variables are connected. The 
graph weights v(G p ) are shown in Fig. [?]. Contrary to substep I, we have to calculate the flipping probabilities of the 
loops according to Eq. (|l7|), since there is no symmetry similar to the spin- inversion symmetry in the first algorithm. 

In contrast to the conventional local-update worldline algorithm, simulations can be performed in either the canon- 
ical or the grand canonical ensemble, with either constant or variable magnetization in the present method. A change 
in the particle number or the magnetization results from loops that wrap around the lattice in temporal direction one 
or more times. If the particle number or the magnetization should be fixed, we can simply disallow flipping these loops 
without violating the detailed balance condition. Since the loop algorithm is no longer restricted to the subspace of 
a constant spatial winding number, a negative sign may appear also for the ID t-J model. However, here the sign 
problem is not really a difficulty because it becomes less significant as the system size becomes larger. It can also be 
avoided if one chooses the subspace of constant winding number. 



IV. IMPROVED ESTIMATORS 

Let us now discuss "improved estimators" .El They reduce the error of measured quantities by implicitly averaging 
over many configurations. In a MC simulation we construct, with the loop algorithm, a series of i = 1, N config- 
urations C^'. In the first step of each loop update we define a graph C?W which consists of a set £W f loops. 
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FIG. 7. Plaquette configurations C p and graphs G p for substep III (flip spin-up hole) of the t-J model. The solution for 
substep II is equivalent. The upper part of the figure specifies the graphs G p and the freezing-less solution v(G p ) of equation 
(|l2|). The first six configurations C p and the solution v(G p ) restricted to these configurations correspond to the XY-model (free 
hardcore bosons). See also Fig. [| 



From this graph we can reach any member of a set of 2™ worldline configurations by flipping a subset of the 
loops. The probability p(C') for each of the configurations C £ is determined by the loop flip probabilities pmp- 
In the second step one configuration Cv»+ 1 ) w m then be chosen randomly according to these probabilities. 

An improved estimator Oi mpr for the expectation value Eq. (^) can be constructed by averaging over the value in 

each of the 2™ states C G rW that can be reached from the state Cw, instead of measuring only the value in one 
state C (i) : 

_ 1 N 

(0) = (O impT ), O impT = 2 0(C')p(C), O impr = -^O impr , (18) 

cerw »=i 

where the probability p(C) of the configuration C' can be calculated as a product of the loop flip probabilities pmp- 
(Actually, we can choose some probability pjj ip here that is different from the flip probability used in the MC updates; 
it just needs to satisfy the same detailed balance requirement as paip. Thus there is actually a large variety of improved 
estimators available.) 

To really gain an improvement we need to calculate the average over 2 n< ' states in a time comparable to the time 
needed for a single measurement. Fortunately that is possible. Particularly simple improved estimators can often be 
found in the case that pfu p = ^ for all loops. In that case the above estimate simplifies to 

N 

Oimpr ^E 2 ^ E °( C '), (19) 

as all of the states in now have the same probability 2~ Tl< ' . 

Even if the loop flip probabilities are not all equal, we can still choose a pjj ip such that some loops have p^ ip = 5, 
while the other loops are fixed in a certain state. There are many possibilities to do that. We have chosen to fix the 
state of a loop with a probability of 
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p fix = |2p flip - 1|. (20) 

If Pfiip < 5, the loop is fixed with probability pa x in the old state and if pfa p > ^ in the flipped state. The spins on 
the fixed loops are treated just as the inactive spins. The remaining set !F'^> of n'W < loops can then be flipped 
with new probabilities p' Rip = \ . 



A. Simulations without sign problem 

Let us show two examples of improved estimators for the simple case of substep I of the t-J algorithm (or for the 



Heisenberg antiferromagnet). We provide a more detailed discussion in appendix [B[ From Eqns. ( Jig ) and (B4), , the 
improved estimator (multiplied by 4 for convenience) for the spin correlation function at momentum 7r is 

„ _ ,™ QZ , JO, if the spins are on different loops > 

U impl = 4(5 r Tl b r , iT ,) impr = < [1 . i ^ sping Qn same bop ^ (21) 

Remarkably, the locations of the loops thus corresponds to the spin-spin correlation function. The potential gain 
from using improved estimators is easy to see in this case. 0i mpr takes only the values and 1. Yet it has the same 
expectation value as the unimproved estimator O = 4S^ T S^, T , — ±1. When (O) is small (e.g. (0) ~ exp (— r/£) at 
large r), then the variance of O is 



(O 2 ) - (O) 2 = 1 - (O) 2 « 1 , (22) 



whereas the variance of 0i mpr is 



(Of mpr ) - (O impr y = (0 lmpr ) - (O impr y « (0 impr ) = (O) « 1 . (23) 

For a given distance r, the gain from using the improved estimator appears largest at small correlation length £, 
whereas the gain from reducing autocorrelations with the loop algorithm is largest at large £. Using the improved 
estimator can therefore reduce the variance, and thus the computer time required for a given accuracy, by a large 
factor. The non-improved estimator may, however, have a sizeable amount of self-averaging from summing over all 
lattice sites, which can cancel part of this gain. 

An especially simple estimator can also be derived for the uniform magnetic susceptibility (x) — 

(mi Er, T S r,r) ) h Y usin S 



£^E^= E E ^ = ^^), (24) 

T i" (loops V) ((t,t) in /) loops / 

which gives (x) = (Ximpr) simply as the the sum of the square of the temporal winding numbers u>t(l) of the loops I: 

Ximpr = tM J2 w t {lf ■ (25) 

loops / 

Here V is the number of spins in the lattice, D is the number of terms in the Trotter decomposition (D = 2 for a 
nearest neighbor chain) and M is the number of Trotter time slices. Thus VDM is the total number of spins in the 
classical d + 1-dimensional lattice. In the single cluster variant, the sum over the loops in Eq. ( pE| ) is also calculated 
stochastically. Since there we pick a single loop I with a probability \l\/(VDM) proportional to its size |Z|, we have 
to compensate for this extra factor and obtain: 

/ \ g 2 p 2 B l3 ,DM 2 

(x) = — 4 — (~\[\~ Wt ( l > >■ ( 26 > 

The improved estimators for more general spin and charge correlations are derived in appendix |b|. 
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B. Simulations with a sign problem 



In the case of simulations with a negative sign problem, expectation values have to be computed according to 
Eq. (0). Improved estimators can again help here, as they reduce the variance, and thus the error of the sign. 
Let us restrict ourselves to the case of the t-J model on a single chain, for which 

sign(C) = (b c ) n * ((-l)^t-i)«« ( _ 1} AW 5 (27) 

or to (possibly frustrated) spin models on any lattice, for which 

sign(C) = (-1)^ . (28) 

Here b c = +1 for periodic and b c = — 1 for antiperiodic boundary conditions, N tot is the total particle number, n x 
denotes the number of particles hopping across the boundary and N neg are the number of plaquettes with negative 
weight. In the canonical ensemble, we can decompose the sign as 

sign(C) = II s ( C p)' ( 29 ) 

plaquettes p 

where the product extends over all plaquettes of the lattice. When 6 C (— l)^* * -1 = —1, the sign of a plaquette is 
defined as 

-1 if for the t-J model either w(C p ) < or a particle hops across the border, but not both, 
s(C p ) = i —1 if for a spin model w(C p ) < (30) 
1 otherwise 

to satisfy Eq. (§|). (When ^(-l)^*" 1 = 1 it can be defined similarly). 

Improved estimators can be formulated for the sign if we can express it as a product of signs of the loops: 

sign(C«) = So II si S n ( Z ), (31) 

where sq is the sign of the plaquettes that contain only inactive spins. We consider only the case where the flipping 
probabilities are all pmp = 1/2, since especially simple improved estimators are available there: 

(sign(CW)) impr = s • 2"" W H (sign(Z) + sign®). (32) 

This estimator is zero if at least one loop changes its sign when it is flipped. Again this is a very simple estimator. 

The signs of the loops are constructed in the following way: If a plaquette contains only inactive spins, its sign 
contributes to so- If only one loop threads the plaquette, the sign of the plaquette is assigned to that loop. The 
only non-trivial case is where two loops thread a plaquette. We denote these loops by "1" and "2". In that case the 
sign of the plaquette s(C p ) has to be divided into two parts s(C p ) — sc_,g_(1) ■ sc p ,g p (2), depending on the graph G p 
chosen for the plaquette. This must be done in such a way that if one or both of the loops are flipped, the products 
sc p ,G p (T) • s Cp ,g p (2), sc P ,G P (l) • s Cp: G P (2) and s Cp ,G P (T) • s Cp:Gp (2) are equal to the sign of the plaquette with the spins 
on the corresponding loops flipped. (The bars denote flipped spins on that part of the plaquette.) In table || we show 
a solution for sq g p (p) for substep I. In the grand canonical ensemble we get additional sign changes from changes 
of iV t n t in Eq. (|27|). They can be assigned to the loop whose flip causes Atot to change. Overall, an assignment like 
Eq. ([32]) is possible for all three steps of the loop algorithm for the ID t-J model. For frustrated spin problems, signs 
appear only in single-plaquette weights and can therefore always be assigned similar to table |. 

If we consider different geometries for the t-J model, such as ladder systems or higher dimensional lattices, then 
sign(C) gets additional contributions from winding of fermion worldlines inside the boundaries. In this case we can 
still use the improved estimators constructed above for the updates of substep I, since the corresponding spin flips 
do not change the fermion winding number. In substeps II and III more complicated improved estimators can be 
constructed, at least for summing measurements over some of the possible loop flips. However, it is probably sufficient 
to have an improved estimator for one of the substeps, in order to already obtain most of the possible reduction in 
variances. 

In a similar way we can also measure equal time particle-particle and spin-spin correlations, the magnetic suscep- 
tibility and other observables. As an example we present the improved estimator for the uniform susceptibility in 
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TABLE I. Assignment of signs sc p ,G p (p) for substep I of the t-J model (or for the Heisenberg model, when the bipartite 
transformation is not done). The values of sc p ,a p (p) for the plaquettes not in the table are all equal to one. 





Gp 


SC P ,G P (1) 


S C P ,G P {1) 


sc p ,c p (2) 


sc p ,a p (2) 




o - 
2 


-1 


1 


1 


-1 


1 ! I 1 

U LI 


° i 


1 


-1 


1 


-1 






1 


1 


1 


1 


□ □ 


1- 1- 


1 


1 


1 


1 



substep I of the algorithm (or for pure spin models) . If no loop changes the sign of configuration C*M upon flipping, 
then we have 

( S ign-x)S pr = ^s ign (CW) J2 ^) 2 - (33) 

loops 

If exactly two loops I and V change the sign, it is 

(sign • x )« pr = ^l s ign(C^)w t (l)w t (l'), (34) 

and it is zero if one or more than two loops change the sign. As this improved estimator requires to know the sign 
change of a set of loops (all those whose flip is considered in constructing the improved estimator), a multi cluster 
algorithm is advantageous. The improved estimators for more general correlation functions are derived in appendix 



V. RESULTS 



We will now discuss the performance of the new algorithm by comparing the errors and autocorrelation times of the 
local update method and the loop update with and without improved estimators. We will consider four examples: a 
single t-J chain, two coupled t-J chains, and three coupled t-J chains (these are the first MC simulations for coupled 
t-J chains), and a frustrated Heisenberg model on a single chain. 



A. Autocorrelation times 



We have determined the integrated autocorrelation times r® t of our new algorithm applied to a t-J chain. Let us 
first give the details on how we have calculated these times. Let O^' be the estimate of the observable O in the i-th 
step of our MC procedure. It can be either the simple estimator or the improved estimator. As usual, we estimate 
the value of an observable O by Eq. (j7j) as an average over these N measurements. (Similarly for the nominator or 
denominator of Eq. (||).) The error of the estimate is a/y/N — 1, where 

a 2 = 2Tg t (W-0 2 ). (35) 
The autocorrelation time r^ t is given by the autocorrelation function T(t) 

{0 (to + t) (t o)) _ {O (t 0+t)) . {oM) 
[ ' (£>(*o)0(to)) _ (£)(*o)) . (£)(*o)) ■ [ > 



as 



' int 



oo 

= ^ + E r W- (3 7 ) 
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In the MC simulation, we have calculated T® t by grouping the N measurements into n bins of length / = N/n and 
computing the bin averages Ob (I) — j Ylj=(b-i)l+i 0^\ 6 = 1, n. Then we have calculated the variance of these 
averages Ob (I) of bin lengths I: 

1 ™ 

°^ 2 = 7E(^)-0) 2 (38) 

n — 1 * — ' 

6=1 



and the autocorrelation time can be estimated as0 

-fiw - (30) 



whose expectation value becomes equal to Tint given by Eq. ( p7j ) in the limit of I — > oo. As a function of increasing 
bin length i, T^ t (Z) is generally a non-decreasing function. When statistical independence is approached, the increase 
ceases and the expectation value of rP t (l) approaches a constant value. (Note that with a finite number N of 
measurements, the estimate for r^J. will fluctuate increasingly when I is increased.) The asymptotic constant value 
was our estimate for r® t . We have taken bin lengths I — 1, 2, 4, 8, . . .. 

For a comparison of the autocorrelation times T® t we also need to give our definitions of "one MC step" for both 
algorithms. In the conventional plaquette flip algorithm, the lattice is subdivided into four sublattices, which allow 
the simultaneous modification of all sites of the sublattice. In a single MC step we sequentially attempted to update 
all sites of a sublattice, which is generally called one "sweep" over the lattice. For one sweep with the loop-algorithm, 
we chose one of the three steps I, II and III at random and chose graphs G p for all plaquettes. This results in a 
complete decomposition of the lattice into loops. We then attempted to flip each loop. All our simulations were 
done in the canonical ensemble. In general we have performed M — 2.5 • 10 6 MC steps for the loop algorithm and 
M = 30 • 10 6 to 70 • 10 6 MC steps for the plaquette flip algorithm. Despite these very long simulation times we found 
cases (only for the conventional algorithm), where TP t (l) keeps increasing as a function of the bin size I. In these cases 



we took Tj^ t (/) as a lower bound for Eq. (37) with the largest I where a statistically reliable estimate is still possible. 



Since the value of depends strongly on the observable O, we have calculated Tint for the internal energy, the 
static charge-charge correlations 

1 L 

S d k ) ^^^(Kt + ^J^t + M)), (40) 



L 

3,m 



and the spin-spin correlations 



L 



Ss(k) = jJ2e tkU ~ m) (S*S z m ). (41) 



In Fig. |^ we show rP t as a function of the inverse temperature [3 for a single t-J chain. We have performed 
simulations on a smaller lattice, with L = 16 sites and Art = 0.25 and on a larger lattice with L = 64 sites and 
Art = 0.125. All measurements have been performed at quarter band filling and for ratios of J/t — 1 and 2. 

For the loop algorithm we obtained values of r; n t between one and 15 for all observables and all parameter values. 
Especially there is no significant increase of r; n t with increasing [3. This is in contrast to the conventional algorithm, 
where T lnt is between 100 and 1000 for the internal energy, but even larger than 10000 for the spin-spin correlations. 
We have to point out here that the values of T- lrA for the larger lattice with the conventional algorithm are most 
likely poor lower bounds of the real autocorrelation times T mt , since we have not been able to reach a plateau for 
T int(0 in these cases. Obviously the loop algorithm is especially effective in reducing the autocorrelation times for the 
spin- spin correlations. Thus our new algorithm for the t-J model works successfully, saving orders of magnitude in 
computational effort. 



B. Improved estimators 



Next we show some results and the effect of improved estimators for the t-J chain, obtained with the multi-cluster 
algorithm. The results of the measurements can be seen in table [n|. We have considered different correlation functions, 
such as the spin-spin and charge-charge correlations. For all observed quantities, the variance is reduced with the 
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FIG. 8. Integrated autocorrelation times Tint f° r the one dimensional t-J model at quarter band filling on a lattice of 
L = 16 sites with Art = 0.25, and L = 64 sites with Art = 0.125. The results of the plaquette flip algorithm is shown with 
open symbols, the results of the loop algorithm with filled symbols. For the loop-algorithm we took anti-periodic boundary 
conditions, for the plaquette flip algorithm we used the zero-winding boundary condition. Figure (a) shows Tj nt for the internal 
energy, (b) for charge-charge correlations S c at k — Uf = tt/4 and (c) spin-spin correlations S s at k = /cf- The arrows 
denote measurements where r; n t is only a lower bound, see text for details. The loop algorithm gains orders of magnitude 
in computational effort over the traditional plaquette flip algorithm, with increasing gains at low temperatures and for large 
systems. 
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TABLE II. Results for the t-J chain: Comparison of improved (I) and unimproved (U) measurements for the single chain t-J 
model. The measured quantities are the internal energy e, the charge-charge correlations S c (k = 7r/4), the spin-spin correlations 
S 3 (k = 7r/4) and the real-space spin-spin correlations at r — L/8 and r = L/4. We have considered a system with periodic 
boundary conditions and 64 sites, J = t, j3J = 16, Art = 0.125. The number of particles n v are 32 and 48. For comparison, 
we show in the last two rows the results for the Heisenberg chain of the same length (hb), at /3J = 16. We performed 100000 
updates for each simulation. 



model 


alg. 


e 


error 


Sc(k = f ) 


error 


S s (k = f ) 


error 




error 


(SiS i+i ) 


error 


tip 


= 32 


U 


-0.75295 


0.000292 


0.28284 


0.00053 


0.78092 


0.00202 


0.00533 


0.00015 


0.00088 


0.00011 


n p 


= 32 


I 


-0.75254 


0.000277 


0.28238 


0.00045 


0.78012 


0.00161 


0.00535 


0.00012 


0.00086 


0.00008 


rip 


= 48 


u 


-0.81528 


0.000322 


0.23708 


0.00066 


0.71221 


0.00096 


0.01046 


0.00029 


0.00220 


0.00021 


iip 


= 48 


I 


-0.81563 


0.000288 


0.23636 


0.00056 


0.71135 


0.00072 


0.01078 


0.00022 


0.00230 


0.00016 


hb 




u 


-1.38017 


0.00046 






0.67635 


0.00151 


0.04659 


0.00064 


0.00761 


0.00054 


hb 




I 


-1.37965 


0.00037 






0.67480 


0.00105 


0.04739 


0.00053 


0.00797 


0.00044 



TABLE III. Results for the frustrated Heisenberg chain. Improved and unimproved measurements of the sign for the J-J' 
Heisenberg model on 20 sites for J = 10J' for different values of the freezing ratio e and f3J = 0.1 and f3J = 0.2. In the last 
column, we show the ratio of the errors between the improved and the unimproved errors. 



T/J 


e 


improved sign 


error 


unimproved sign 


error 


ratio 


0.1 


0.1 


0.00587 


0.00109 


0.00651 


0.00162 


1.48622 


0.1 


0.2 


0.00535 


0.00075 


0.00570 


0.00133 


1.76066 


0.1 


0.3 


0.00581 


0.00084 


0.00536 


0.00146 


1.74388 


0.1 


0.4 


0.00479 


0.00084 


0.00379 


0.00134 


1.60041 


0.1 


0.5 


0.00444 


0.00105 


0.00390 


0.00148 


1.41069 


0.1 


0.7 


0.00614 


0.00223 


0.00638 


0.00263 


1.17623 


0.2 


0.1 


0.08953 


0.00201 


0.08985 


0.00243 


1.20779 


0.2 


0.2 


0.08681 


0.00151 


0.08632 


0.00195 


1.29734 


0.2 


0.3 


0.08933 


0.00152 


0.08886 


0.00202 


1.32511 


0.2 


0.4 


0.08387 


0.00166 


0.08250 


0.00222 


1.33664 


0.2 


0.5 


0.08564 


0.00219 


0.08465 


0.00256 


1.16866 


0.2 


0.7 


0.08290 


0.00372 


0.08259 


0.00400 


1.07410 



application of improved estimators. The variance of the improved measurements is up to a factor of 1.7 smaller 
than without the use of improved measurements. Note that in the unimproved measurements, we have measured the 
correlation functions from each lattice site. Thus here we can have large self-averaging when summing the correlation 
measurements over the large lattice, which cancels part of the gain from improved estimators. 

In order to investigate the improved estimators for simulations with a sign problem, we have simulated a simple 
frustrated spin system, namely the Heisenberg chain with nearest and next-nearest neighbor interactions. With the 
notations of Eq. ([!]) the Hamiltonian reads 

Hj. r = ^(JS, ■ S i+1 + J' Si ■ S i+2 ) . (42) 

i 

We have implemented this model by the continuous time loop-algorithm. For this model we have to use a finite 
probability e for diagonal graph segments, which implies a finite probability for freezing (see Fig. |j); otherwise the 
algorithm is not ergodic (and no negative sign will appear). The sign problem is very severe here. Even with a 
relatively weak frustrating coupling of J = 10 J' we have obtained (sign) = 0.0054 ± 0.0007 for (3 J = 0.1 on 20 



sites. Note that we are able to reliably measure a sign this small. In Tab. Ill we show the results for improved and 
unimproved measurements of the sign, for different values of the freezing factor e and temperatures of T/J = 0.1 and 
T/J = 0.2. The errors and the improvement due to the improved estimators depends on the value of the freezing 
factor e, in this case the optimal value is e f=a 0.2. The improved estimators perform better as the sign decreases, and 
the ratio of the errors of the improved and the conventional measurements increases as the temperature is lowered. 
Note that the factor of the improvement of 1.75 leads to a reduction of a factor three in CPU time. 
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C. two leg t-J ladder 



As an example of the loop algorithm for the t-J-mode\ beyond a single chain, we show the results of a calculation 
for the magnetic susceptibility of the t-J ladderO In Fig. 0, we show a graphical representation. These are the 



t',J' 



i=1 



o — o 



o — o — o — o 



O a=2 



O a=1 



FIG. 9. Schematic picture of the t-J ladder with two legs and L rungs. The couplings along the rung are J' and t! , those 
along the ladder direction are J and t. 



first QMC calculations for t-J ladder systems. For the Trotter-Suzuki breakup, we have split the Hamiltonian into 
bond-terms, so that again we obtained a model on a checkerboard-like plaquette lattice, and our loop algorithm could 
be applied unchanged. We have performed simulations with two holes and J' = t' = 4J = 4i, where J' and t' are the 
interactions on each rung, and J and t are those along the legs. This parameter regime is dominated by the strong 
coupling limit J' S> J, t. In this limit, we have a simple and intuitive picture, following Refs. The undoped 

ladder consists of weakly coupled singlet pairs formed on the rungs (Fig. |lO|a). A single hole doped in such a ladder 
will stay in either a bonding or anti-bonding orbital on a rung, while the rest of the system will remain unchanged 
(Fig. Mb). The energy of the lower lying bonding orbital is given to first order by the cost of breaking a bond J' and 
a kinetic energy gain of t' along the rung and t along the ladder direction. Two holes on the same rung also break 
one bond J', but their kinetic energy gain is only of the order of 4t 2 /J' (Fig. [LQc). Hence the total energy of two 
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FIG. 10. Graphical representation of the low-lying states of the t-J ladder in the strong coupling limit J' 2> J, t. (a) The 
undoped case with a ground state energy -B(O). (b) A single hole goes either into the bonding or anti-bonding orbital, the 
energy of the ladder with a hole in a bonding orbital is E(0) + J' — t — t in first order, (c) Two holes on a single rung, with 
E(0) + J' in first order, (d) Two holes on different rungs, with an energy E(0) + 2J' — 2t' — 2t. 

unpaired holes in the ladder is of the order of E(0) + 2 J' - 2t' - 2t (Fig. [l^d), while two holes bound on a single rung 
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have an energy of -E'(O) + J' in first order, where E(0) is the energy of the corresponding Heisenberg ladder. We can 
therefore expect that the two holes in the parameter region considered in this simulation remain unpaired and thus 
two of the rungs will stay in a doublet state. The low temperature Curie law is then given by x = 2^|J-^. In Fig. |ll| 
we show our results, which are in excellent agreement with the expected limit of 4Tx/site =1/16 for two unpaired 
holes and 16 rungs as T — > 0. A more physically realistic parameter region is J' = J = t/3 = t'/3. Then two holes 
form a bound state in their groundstate (Fig. [To|d) . Unfortunately the sign problem is much worse in this region. 




FIG. 11. t-J ladder: a) Magnetic Susceptibility of the t-J-ladder with J' = t' = AJ = At and two holes on 16 rungs, for 
temperatures down to T = £'/16 compared to the undoped case (n p = 32). b) average sign for the doped case. 



D. three leg t-J ladder 

Several studies show. t j ha.t t he ladders with an even number of legs behave completely differently than those with 
an odd number of legs.E3EaE3 In this paragraph we will concentrate on the 3-leg t-J-ladder. The couplings along the 
legs (t, J) and the couplings perpendicular to it (t' , J') are assumed to be equal: t = t' and J = J'. 

At low hole doping, the three leg ladder consists of two components: a conducting Luttinger liquid in the channel 
with odd parity under reflection_a.bout the center leg, coexisting with an insulating (i.e. undoped) spin liquid phase 
in the two even-parity channelsEj At small doping, all holes enter the Luttinger liquid and repel each other, while 
the spin liquid remains undoped. 

The energy gap A between the odd-parity channel and the even-parity spin liquid states have been calculated by 
exact diagonalization of very small ladders of only 3x6 sites in Ref. |3^. Using the QMC loop algorithm we are able to 
estimate the energy gap A between odd and even-parity channels for much larger ladders. We have considered 3-leg 
i-J-ladders of 3x40 sites doped with one hole. We have assumed periodic boundary conditions along the ladder and 
set J/t = 0.5. With this choice of parameters, we reach temperatures down to fit — 7. Below this temperature the 
sign is smaller than 0.01. Note that the sign of the MC simulations of these t-J-ladders is not sensitive to the length 
of the ladder, but only to their width, the number of doped holes and the fraction J/t. In the temperature range 
considered, finite size effects for L > 40 are negligible. 

Figure [l^ shows the probability n con t er leg of the single hole to be located on the center leg of the 3-leg ladder. At 
high temperatures the hole is uniformly distributed over the ladder. Therefore the density n CC ntcr leg is equal to 1/3. 
At zero temperature, however, the hole is in the lowest state of the odd-parity channel -and is dominantly on one of 
the outer legs. The density n con ter leg at T = is only n cen tcr leg ~ 0.2 for a 3x8 ladder SB 

At very low, but finite temperatures, other states with the hole in the odd-parity channel also have non vanishing 
weight in the thermal average. As these states all have odd parity, the density ri cen t C r leg is suppressed and clearly 
smaller than 1/3. Fig. [l^ shows that at higher temperatures, T > t/7, n cen ter leg is larger than 1/3. This is caused 
by admixture of higher lying even-parity channel states. The sharp drop of n cen ter leg below T/t = 0.5 shows the 
decreasing weight of the even-parity channel states as T is lowered. The gap A can be estimated from the MC data 
using a simple two band low-energy model: 
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The two lowest lying bands of a 3 x 8 ladder doped with one hole are shown in Fig. 5 of Ref. |35|. The states belonging 
to the lowest (second lowest) lying band are of odd (even) parity. These two bands approximately have cosine forms 
and can be seen as Bloch waves of two different transverse wave functions </>°4ns an d ^trans m a nrs * approximation. 
Then the probability of the hole to be located on the center leg for all states in the odd (even) parity band is constant 
(independent of the wave vector k) and equal to a value Center ("-center) which is determined only by 4>°r&ns (^trans)- 
From the exact diagonalization of a 3 x 8 ladder one sees that the approximation of a k independent n°^ tcl . (Center) 
is valid within 10%, and one gets an estimate: n°^ tcr « 0.2 and Center ~ 0.45E3 Since this situation is not supposed 
to change qualitatively as the length of the ladder gets longer, the low temperature behavior of n CC nter leg can be 
described by this two band model also in the case of a long ladder. Therefore, considering the density of states, the 
expectation value of n centor i eg can be calculated as a function of the temperature T and compared to the MC results. 
From this one can get an estimate of the gap A between the odd and even parity band in a long ladder: 

A«0.25(5)i = 0.5(l)J (J/t = 0.5). (43) 

This fit is shown in Fig. |l2] and is reasonable at low temperatures only. At higher temperatures other bands have to 
be considered too. For this fit of the low energy model results to the MC data all other parameters (^ center' "center 
and the bandwidths of the two bands) are assumed to be equal to those of a 3 x 8 ladder. But even if these paramaters 
are varied in a physically reasonable range, the gap A hardly changes. The *alue obtained for A (Eq. |43j) is bigger 
than that of the 3x6 ladder, obtained by exact diagonalization (A = 0.15i).E3 This difference may result either from 
strong finite size effects in the small clusters or from the fact that the low energy model described above is not so 
precise in the temperature range where it was used for fitting the MC results. 




FIG. 12. Temperature dependence of the probability of the hole to be located on the center leg n ccnt0 r leg of a 3-leg t- J-ladder 
doped with one hole. The filled circles are the QMC loon-data for a 3x40 cluster and the zero temperature value (diamond) is 
calculated for a 3x8 cluster using exact diagonalization.tJ The dashed line shows the fit calculated by a low-energy two band 
model. 



VI. CONCLUSIONS 



In this paper we have introduced a loop algorithm for simulations of t-J type models and discussed the use of 
improved estimators, especially the use of improved estimators for models with a sign problem. 

We found many significant improvements for the loop algorithm compared to previous local updating MC algorithms. 
The loop algorithm is fully ergodic for any geometry of the lattice, without the introduction of any additional updating 
procedure. With the loop algorithm it is possible to perform simulations in the canonical or grand canonical ensemble, 
with fixed (constant winding number) or free magnetization in a natural way. 

The most important improvement of the loop algorithm is certainly the great reduction of the autocorrelation 
time r. We have shown examples in Sec. [v|, where for the parameters studied, the reduction is up to four orders of 
magnitude. This gain will increase further for larger systems and lower temperatures. This huge reduction of the 
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autocorrelation times allows to study much bigger systems at much lower temperatures than before with the same 
amount of computer time. 

The loop algorithm for the t-J model can also be extended to various other models. Different additional terms can 
be incorporated easily into an overall flipping probability of the loops. For some new terms it might be favorable to 
change the weights v(G p ). The loop algorithm is easily adapted to other lattice geometries. This can be done simply 
by changing the underlying geometry of the lattice in the simulation and introducing corresponding additional terms 
in the Trotter decomposition. 

With the loop algorithm it is also possible to perform simulations in the continuous time limit Ar — > (see appendix 
). Therefore, we can eliminate the errors due to the finite time steps Ar without making simulations for different 
values of Ar and extrapolating to Ar = afterwards. Again we can save a large amount of computer time compared 
to discrete time simulations. 

The use of improved estimators further reduces the variance of measured quantities. The introduction of improved 
estimators also for models with a sign problem allows the investigation of many new systems with this method, 
e.g. frustrated spin problems. We have presented t-J ladders and a frustrated Heisenberg model as examples. The 
reduction of the variance by the improved estimators depends very much on the model and the observable under 
consideration. For the systems we have studied here, the improved estimators helped to reduce the variance of the 
observables by about one third. 

Although we can simulate much bigger systems much faster than before with these new techniques, the sign problem 
still remains and limits the application of the loop algorithm to systems where the negative sign problem is not too 
severe. We have shown examples of t-J ladder systems in Sec. |y|. Despite this drawback for higher dimensional 
fermion systems, many new problems that are far beyond the scope of previous local MC techniques can be tackled 
due to the advantages of these new simulation techniques. 
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APPENDIX A: LOOP ALGORITHM IN CONTINUOUS IMAGINARY TIME 

In this appendix we briefly review the main idea behind the continuous time formulation of the loop algorithmEl 
and how it can be used for the t-J model. The continuous time version depends on the fact that the loop algorithm 
is well defined even in the limit Ar — * 0. All the probabilities for choosing graphs G p have well defined values in this 
limit. 

Note that for Ar — > the frequency of a worldline hopping from one site to another tends to a finite limit, because 
the number of time slices where such a hop can occur is proportional to Ar -1 , and the hopping probabilities are 
of the order O(Ar). In the continuous time formulation a configuration is therefore best specified through the time 
values Ti at which the spin configuration changes, as well as the initial configuration at the time r = 0. This way of 
specifying the configuration reduces the memory requirements by about an order of magnitude compared to a discrete 
time implementation at a typical value of Ar. 

We now describe the continuous time limit of the loop construction. In the discrete time implementation we loop 
over all time slices and decide the graph segments for each plaquette on this time slice. 

In the continuous time limit we need a new procedure. We note that the probability for having a graph segment 
which forces the loop to "jump" to another site is proportional to the infinitesimal time step dr: p — Adr. The 
probability for continuing straight on is 1 — O(dr) on the other hand. The situation is therefore equivalent to a 
radioactive decay process with "decay constant" A. This decay constant depends on the spin configuration and can 
change only at the time steps rj where there is a change in the configuration. We have listed the decay constants in 
Figs. |^, [6] and ^ together with the probabilities for finite Ar. 

Instead of deciding at each infinitesimal time step dr if the loop "decays" (i.e. jumps) to another site with probability 
dr we calculate a "decay time" after which the loop "decays" to a neighboring site. As the decay processes to the 
various neighbors of a site are independent we can calculate independent decay times for each of these "decay channels" . 
A special case are the finite number of time points where a world line jumps to a neighbor. These are treated like in 
the discrete time algorithm. There the loop has to jump to the neighboring site. This is called a "forced decay" in 
Ref. [D| 
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The loop flip probabilities also have a well-defined continuous time limit. In substep I they are always 1/2, which 
holds even in the continuous time case. The only nontrivial probabilities are in steps II and III. There are two 
contributions from the weights tu asymm (Cp). The forced decays contribute ratios such as 

e ArJ/2 sh(ArJ/2) _ j 
Wasymm(t-pJ/W aS y mm (CpJ — sh(Art) ~ 2t 

or the inverse of it. For straight worldlines between two decays at n and ti the product over all infinitesimal weights 
has to be considered. The continuous time limit is 

M^oo II ^ymm(C p (ri + j — )). (A2) 



In particular 



i=0 



lim TT e J w c h(jA) = e J(™-n)/2 (A4) 

Thus the forced decays contribute terms like J/2f or 2t/J and the straight pieces just contribute the classical Ising 
weights of the worldline segments. 

While this continuous time algorithm is more complex to implement than the discrete time version, it has two 
significant advantages. One advantage, mentioned already above, is that the memory requirements are reduced by up 
to an order of magnitude, depending on the implementation. This is crucial if one wants to simulate huge systems, 
where memory constraints become the restricting factor. 

The main advantage is however that in the continuous time algorithm there is no systematic error associated with a 
finite time step At. In the discrete time algorithm this systematic error could be controlled by simulating for several 
values of the time step At and then extrapolating to At = 0. In our experience this need to run several simulations 
makes the discrete algorithm about a factor 4-8 slower, depending on the hardware platform and implementation. 



APPENDIX B: IMPROVED ESTIMATORS FOR CORRELATION FUNCTIONS 



In this appendix we show improved estimators for charge and spin correlation functions. First we consider the spin 
correlation function (S^ T S^, T ,) between two spins at sites r and r' and at imaginary times r and t' respectively. The 
improved estimator is 

J2 s; iT (C)s$ t AC) P {C) (bi) 

cer« 

As each spin can be on one loop only, this sum can be simplified substantially. If the two spins are on different loops 
it is 

[(1 - Pmpk + pfliptf] [(1 - p' mp y + p' mp a'] , (B2) 

where a is the value of the T in the original state £jW, and a the value in a state where the loop containing the 
spin is flipped. The flip probability of this loop is given by pfa p . Similarly the primed symbols refer to the other spin. 
If both spins are on the same loop it is 

[(l-pmpW+pflip^'] (B3) 

The equation for the cases where one or both spins have been fixed, either because they are inactive in the t-J model 
algorithm or because the loop has been fixed, are straight forward. 

Let us now make the above estimators more specific. In the case of a pure spin Hamiltonian and in substep I of 
the t-J algorithm we have pmp — \ and o 7 = —a. In this case the improved estimators are very simple, namely 
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TABLE IV. Improved estimators for spin correlations in the t-J model and in pure spin models, for simulations with a sign 
problem, from Eqns. (B6, B7) in the case pflj p = 1/2. 





nnlv lnnn / cliflncps sipti 




qnv nttipr Innn ptiflnp^ps sio'Ti 


Step I or pure spin models 








both spins active 








spins on different loops 


u 


sign(C)<x<r 


U 


spins on same loop 


o 




o 


one or both spins inactive 










Step II and III 








both spins active 








spins on different loops 


±isign(C)(a±i) 
sign(C) [±I(a + a')-H 


sign(C)(a±±)(a'±±) 





spins on same loop 







spin a' inactive 


±sign(C)(a±i)a' 








TABLE V. Improved estimators for charge correlations in substeps II and III of the t-J model algorithm, for simulations 
with a sign problem, from Eqns. (B6, B7), for the case pfj; p = 1/2. (The improved estimators are trivial for substep I.) 





only loop I changes sign 


both loops change sign 


any other loop changes sign 


both spins active 








spins on different loops 


sign(C)i(n-i) 
sign(C)i(n + n' - 1)) 


sign(C)(n-I)(n'-I) 





spins on same loop 







spin n inactive 


sign(C)(n-±)rc' 








, z oz \ _ f 0) if the spins are on different loops 

I r,r r ',r'A«ipr s a(J i ^ j£ ^ ne spins are on the same loop. ^ ' 

(Moreover, for the Heisenberg-antiferromagnet, we have a = +(— )cr / when the spins are located on the same (on 
different) sublattice.) For substep II and III of the algorithm the estimators are slightly different. There the flipping 
probabilities are not equal, and a = ±g — a, as we change up (down) spins into holes and vice versa. In this case 



the improved estimators Eqns. (B2, |B3|) look more complex but can be simplified by fixing some loops so that the 
remaining flipping probabilities are all pfn p = 1/2. The spins on the fixed loops are treated just as inactive spins. 
Similar improved estimators can be used for charge-charge correlations 

J2 n T!T (C)n r ,y{C)p(C) (B5) 
cerw 

with a suitable reassignment of "er" . They are trivial for substep I or pure spin models, since then only spin degrees 
of freedom are changed. For steps II and III the occupation number n changes to n = 1 — n, because these steps 
exchange a hole with an up or down spin. We see that the calculation of improved estimators of correlation functions 
can be performed with similar effort as for the non-improved estimators. 

For simulations with a sign problem similar improved estimators can be derived. For the two-site spin or charge 
correlation functions two different cases have to be distinguished: Both spins are on the same loop I, or they are on 
two different loops 1,1'. If they are on the same loop, the improved estimator is 

[(1 - pflip(0)sign(/)<T(r' + pflip(Z)sign(7)ffCT / ] (B6) 

xs ° x II ((! ~-Pfiip(0) si g n (*) +Pfiip(«)sign(7)). 

loops i^l 



In the second case it is: 



[(1 - Pfii P (l))sign(l)a + pfn p {l)sign(l)a] 

x [(1 - p fli p(Z'))sign(Z'y + Mip (/')sign(7 7 )7j'] (B7) 
xs x Yl ((! --PflipW) si g n W +Mip(«)sign(7)). 



loops i^l,V 



In simulations with a s ign pr oble m it is of advantage to have all flipping probabilities Pfn p (i) = 1/2. Then the last 



term in the Eqns. (B6) and (B7), the product ~ Pflip W) s ig n (*) + Pflip W s ig n (*)); vanishes if one of the loops 
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changes its sign, which m akes the estimators simple. If no loops flip their sign, the improved estimators are equivalent 
to the above ones (Eqns. (B2 , B3)). The only other two cases with nonzero improved estimators occur if one or both 
of the loops goingthrough the spins under consideration change their sign. These improved estimators are presented 
in tables IV and | 
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